AITopics | text data

Collaborating Authors

text data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Proximal Causal Inference With Text Data

Neural Information Processing SystemsMar-22-2026, 21:18:01 GMT

Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. These approaches, however, assume analysts have supervised labels of the confounders given text for a subset of instances, a constraint that is sometimes infeasible due to data privacy or annotation costs. In this work, we address settings in which an important confounding variable is completely unobserved. We propose a new causal inference method that uses two instances of pre-treatment text data, infers two proxies using two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula. We prove, under certain assumptions about the instances of text and accuracy of the zero-shot predictions, that our method of inferring text-based proxies satisfies identification conditions of the proximal g-formula while other seemingly reasonable proposals do not. To address untestable assumptions associated with our method and the proximal g-formula, we further propose an odds ratio falsification heuristic that flags when to proceed with downstream effect estimation using the inferred proxies. We evaluate our method in synthetic and semi-synthetic settings---the latter with real-world clinical notes from MIMIC-III and open large language models for zero-shot prediction---and find that our method produces estimates with low bias. We believe that this text-based design of proxies allows for the use of proximal causal inference in a wider range of scenarios, particularly those for which obtaining suitable proxies from structured data is difficult.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.59)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

End-To-End Causal Effect Estimation from Unstructured Natural Language Data

Neural Information Processing SystemsMar-21-2026, 13:16:39 GMT

Knowing the effect of an intervention is critical for human decision-making, but current approaches for causal effect estimation rely on manual data collection and structuring, regardless of the causal assumptions.

artificial intelligence, natural language, proceedings, (10 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.92)

Add feedback

ea159dc9788ffac311592613b7f71fbb-Supplemental.pdf

Neural Information Processing SystemsFeb-19-2026, 10:54:39 GMT

phoneme, text data, unlabeled text data, (17 more...)

Neural Information Processing Systems

Country: Europe > Germany > Saxony > Leipzig (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

f53a37f820d5be5930415d964f4a0187-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 17:35:22 GMT

machine learning, natural language, proxy, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Government (1.00)
(4 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Bridge the Modality and Capability Gaps in Vision-Language Model Selection

Neural Information Processing SystemsFeb-11-2026, 13:36:51 GMT

To better reuse the VLM resource and fully leverage its potential on different zero-shot image classification tasks, a promising strategy is selecting appropriate Pre-Trained VLMs from the VLM Zoo, relying solely on the text data of the target dataset without access to the dataset's images.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

1cba8502063fab9df252a63968691768-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-8-2026, 23:02:24 GMT

dataset, participant, video, (16 more...)

Neural Information Processing Systems

Country:

Asia > Nepal (0.05)
Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)

Genre:

Research Report > New Finding (0.94)
Questionnaire & Opinion Survey (0.69)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
Information Technology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking

Vendrame, Katia, Yusuf, Bolaji, Kesiraju, Santosh, Sedláček, Šimon, Plchot, Oldřich, Černocký, Jan

arXiv.org Artificial IntelligenceDec-1-2025

End-to-end spoken dialogue state tracking (DST) is made difficult by the tandem of having to handle speech input and data scarcity. Combining speech foundation encoders and large language models has been proposed in recent work as to alleviate some of this difficulty. Although this approach has been shown to result in strong spoken DST models, achieving state-of-the-art performance in realistic multi-turn DST, it struggles to generalize across domains and requires annotated spoken DST training data for each domain of interest. However, collecting such data for every target domain is both costly and difficult. Noting that textual DST data is more easily obtained for various domains, in this work, we propose jointly training on available spoken DST data and written textual data from other domains as a way to achieve cross-domain generalization. We conduct experiments which show the efficacy of our proposed method for getting good cross-domain DST performance without relying on spoken training data from the target domains.

artificial intelligence, natural language, target domain, (15 more...)

arXiv.org Artificial Intelligence

2511.22503

Country: Europe > Czechia (0.47)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding

Ma, Yuchen, Frauen, Dennis, Schweisthal, Jonas, Feuerriegel, Stefan

arXiv.org Artificial IntelligenceOct-31-2025

Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are incomplete representations of the original patient information. In this work, we make three contributions. (1) We show that the discrepancy between the data available during training time and inference time can lead to biased estimates of treatment effects. We formalize this issue as an inference time text confounding problem, where confounders are fully observed during training time but only partially available through text at inference time. (2) To address this problem, we propose a novel framework for estimating treatment effects that explicitly accounts for inference time text confounding. Our framework leverages large language models together with a custom doubly robust learner to mitigate biases caused by the inference time text confounding. (3) Through a series of experiments, we demonstrate the effectiveness of our framework in real-world applications.

confounder, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.02843

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.93)
Health & Medicine > Therapeutic Area > Endocrinology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing Cloud Security through Topic Modelling

Saleh, Sabbir M., Madhavji, Nazim, Steinbacher, John

arXiv.org Artificial IntelligenceOct-21-2025

Protecting cloud applications is critical in an era where security threats are increasingly sophisticated and persistent. Continuous Integration and Continuous Deployment (CI/CD) pipelines are particularly vulnerable, making innovative security approaches essential. This research explores the application of Natural Language Processing (NLP) techniques, specifically Topic Modelling, to analyse security-related text data and anticipate potential threats. We focus on Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) to extract meaningful patterns from data sources, including logs, reports, and deployment traces. Using the Gensim framework in Python, these methods categorise log entries into security-relevant topics (e.g., phishing, encryption failures). The identified topics are leveraged to highlight patterns indicative of security issues across CI/CD's continuous stages (build, test, deploy). This approach introduces a semantic layer that supports early vulnerability recognition and contextual understanding of runtime behaviours.

artificial intelligence, ci cd pipeline, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.01463

Country: North America > Canada > Ontario (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Proximal Causal Inference with Text Data

Neural Information Processing SystemsOct-10-2025, 21:42:27 GMT

Data-driven decision making relies on estimating the effect of interventions, i.e. causal effect estimation . For example, a doctor must decide which medicine she will give her patient, ideally the one with the greatest effect on positive outcomes.

experiment, pre 1, proxy, (16 more...)

Neural Information Processing Systems

Country: